Abstract

Standard classification algorithms aim to minimize the probability of making an
incorrect classification. In many important applications, however, some kinds of errors
are more important than others. In this report we review cost-sensitive extensions of
standard support vector machines (SVMs). In particular, we describe cost-sensitive
extensions of the C-SVM and the ν-SVM, which we denote the 2C-SVM and 2ν-SVM
respectively. The C-SVM and the ν-SVM are known to be closely related, and we
prove that the 2C-SVM and 2ν-SVM share a similar relationship. This demonstrates
that the 2C-SVM and 2ν-SVM explore the same space of possible classifiers, and gives
us a clear understanding of the parameter space for both versions.


1  Introduction 

In a standard classification problem the goal is to minimize the probability of making an
error. In many important applications, however, some kinds of errors are more important
than others. In tumor classification, for example, the impact of mistakenly classifying a
benign tumor as malignant is much less than that of the opposite mistake. However, nearly
all work on classification to date optimizes a “probability of error” criterion. An exception is
a recent body of work known as “cost-sensitive classification” that assigns costs to different
errors and attempts to minimize the expected misclassification cost.

Support vector machines (SVMs) can be extended to the cost-sensitive setting by
introducing an additional parameter that penalizes the errors asymmetrically. This approach
has been taken by several authors to adapt the C-SVM to be cost-sensitive [1-4], but this
strategy also applies to an alternative SVM formulation, the ν-SVM [5]. We refer to the
cost-sensitive extensions as the 2C-SVM and 2ν-SVM respectively. The primary motivation
for these methods is to address the problem described above, but they can also be applied
to deal with the difficulties that arise when the class frequencies in the training data do
not accurately reflect the true prior probabilities of the classes. Additionally, cost-sensitive
classifiers are useful in the Neyman-Pearson classification context [6]. In all of these settings,
a critical problem is that of parameter selection: the parameter settings that would result in
the “best” performance are not known, and so the user must use the training data to estimate
appropriate values for the parameters. Thus, it is vital that we understand how varying the
parameters of either the 2C-SVM or the 2ν-SVM will impact the resulting classifier.

*Supported by NSF, AFOSR, ONR, and the Texas Instruments Leadership University Program.

Email: md@rice.edu
Web: dsp.rice.edu

Report documentation: "The 2ν-SVM: A Cost-Sensitive Extension of the ν-SVM", December 2005.
Rice University, Department of Electrical and Computer Engineering, Houston, TX 77005.
Approved for public release; distribution unlimited. 15 pages.

In Section 2 we briefly review SVMs. In Section 3 we then introduce the cost-sensitive
extensions of the C-SVM and ν-SVM. The ν-SVM has some properties that make it more
attractive than the C-SVM. This is also the case for the 2ν-SVM. We describe these
properties in Section 4. Among the contributions of this paper is a proof that the 2ν-SVM is
feasible if and only if the parameters lie in a specified range. In Section 5 we show the
2C-SVM and the 2ν-SVM are closely related. Specifically, we generalize a result of [7] and show
that under certain technical conditions, any optimal solution for one of the cost-sensitive
SVM formulations is an optimal solution of the other with the right parameter settings.
Using these results, we then prove a theorem that precisely relates the parameter spaces and
resulting classifiers of the 2C-SVM and the 2ν-SVM.

2  Review  of  Support  Vector  Machines 

Support vector machines (SVMs) are among the more effective methods for classification.
For a more thorough review see [8-10]. In the following, assume that we have access to
training data $(x_i, y_i)$, $i = 1, \dots, n$, where $x_i \in \mathbb{R}^d$ is a $d$-dimensional
feature vector and $y_i \in \{-1, +1\}$ indicates the class of $x_i$.

Conceptually, the support vector classifier is constructed in a two step process. In the
first step, the $x_i$ are transformed via a mapping $\Phi : \mathbb{R}^d \to \mathcal{H}$, where
$\mathcal{H}$ is a high (possibly infinite) dimensional Hilbert space. The intuition is that the
two classes are more easily separated in $\mathcal{H}$ than in $\mathbb{R}^d$. For algorithmic
reasons, $\Phi$ must be chosen so that the kernel operator
$k(x, x') = \langle \Phi(x), \Phi(x') \rangle_{\mathcal{H}}$ is positive definite. This allows us
to compute inner products in $\mathcal{H}$ without explicitly evaluating $\Phi$.

In the second step, a hyperplane is determined in the induced feature space according
to the max-margin principle. In the case where the two classes can be separated by a
hyperplane, the SVM finds the hyperplane that maximizes the distance between the decision
boundary and the closest point to the boundary, known as the margin. When the classes
cannot be separated by a hyperplane, the constraints are relaxed through the introduction
of slack variables $\xi_i$. If $\xi_i > 0$, this means that the corresponding $x_i$ lies inside the
margin and is called a margin error. If $w \in \mathcal{H}$ and $b \in \mathbb{R}$ are the normal
vector and affine shift defining the max-margin hyperplane, then the support vector classifier
is given by $f_{w,b}(x) = \mathrm{sgn}(\langle w, \Phi(x) \rangle_{\mathcal{H}} + b)$. The offset
parameter $b$ is often called the bias.

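There are two different formulations of the SVM. The original SVM [11], which we shall
call the C-SVM, can be formulated as the following quadratic program:

$$
(P_C) \qquad
\begin{aligned}
\min_{w, b, \xi} \quad & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i (k(w, x_i) + b) \ge 1 - \xi_i && \text{for } i = 1, \dots, n \\
& \xi_i \ge 0 && \text{for } i = 1, \dots, n
\end{aligned}
$$

where $C > 0$ is a parameter that controls the tradeoff between minimizing the margin errors
and maximizing the margin. (Here $k(w, x_i)$ is shorthand for
$\langle w, \Phi(x_i) \rangle_{\mathcal{H}}$, matching the classifier defined above.)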

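For computational reasons, it is often easier to solve $(P_C)$ by solving the equivalent dual
problem:

$$
(D_C) \qquad
\begin{aligned}
\min_{\alpha} \quad & \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j k(x_i, x_j) - \sum_{i=1}^{n} \alpha_i \\
\text{subject to} \quad & 0 \le \alpha_i \le C && \text{for } i = 1, \dots, n \\
& \sum_{i=1}^{n} \alpha_i y_i = 0.
\end{aligned}
$$

This formulation is derived by forming the Lagrangian ($\alpha$ is a Lagrange multiplier). The
primal and the dual are related through $w = \sum_{i=1}^{n} \alpha_i y_i \Phi(x_i)$. We will often
have that $\alpha_i = 0$ for most $x_i$. We call the $x_i$ for which $\alpha_i \neq 0$ the support
vectors.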
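The kernel expansion above means the classifier can be evaluated from the dual variables alone. The following is a minimal sketch of that evaluation in plain Python. The Gaussian kernel, the function names, and the convention that a score of exactly zero maps to +1 are assumptions made for illustration; the report does not fix any of them.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel, an assumed example of a positive definite kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decision(x, train_x, train_y, alpha, b, kernel=gaussian_kernel):
    """Evaluate sgn(sum_i alpha_i * y_i * k(x_i, x) + b).

    Only support vectors (alpha_i != 0) contribute to the sum.
    A score of exactly zero is mapped to +1 (an arbitrary convention).
    """
    score = b
    for x_i, y_i, a_i in zip(train_x, train_y, alpha):
        if a_i != 0.0:
            score += a_i * y_i * kernel(x_i, x)
    return 1 if score >= 0 else -1
```

Because $w$ never appears explicitly, the cost of one prediction grows with the number of support vectors rather than with the dimension of $\mathcal{H}$.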

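An alternative (but equivalent) formulation of the C-SVM is the ν-SVM [12], which
replaces $C$ with a different parameter $\nu \in [0, 1]$ that serves as an upper bound on the
fraction of margin errors and a lower bound on the fraction of support vectors. The ν-SVM
has the primal formulation

$$
(P_\nu) \qquad
\begin{aligned}
\min_{w, b, \xi, \rho} \quad & \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{n} \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i (k(w, x_i) + b) \ge \rho - \xi_i && \text{for } i = 1, \dots, n \\
& \xi_i \ge 0 && \text{for } i = 1, \dots, n \\
& \rho \ge 0
\end{aligned}
$$

and dual formulation

$$
(D_\nu) \qquad
\begin{aligned}
\min_{\alpha} \quad & \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \\
\text{subject to} \quad & 0 \le \alpha_i \le \frac{1}{n} && \text{for } i = 1, \dots, n \\
& \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \sum_{i=1}^{n} \alpha_i \ge \nu.
\end{aligned}
$$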

3  Cost-Sensitive  SVMs 

The above formulations implicitly penalize errors in both classes equally. However, as
described in the Introduction, there may be different costs associated with the two different
kinds of errors. To address this issue, cost-sensitive extensions of both the C-SVM and the
ν-SVM have been proposed, which we shall denote the 2C-SVM and the 2ν-SVM
respectively.

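First we will consider the 2C-SVM proposed in [1]. Let $I_+ = \{i : y_i = +1\}$ and
$I_- = \{i : y_i = -1\}$. The 2C-SVM has primal

$$
(P_{2C}) \qquad
\begin{aligned}
\min_{w, b, \xi} \quad & \frac{1}{2}\|w\|^2 + C\gamma \sum_{i \in I_+} \xi_i + C(1 - \gamma) \sum_{i \in I_-} \xi_i \\
\text{subject to} \quad & y_i (k(w, x_i) + b) \ge 1 - \xi_i && \text{for } i = 1, \dots, n \\
& \xi_i \ge 0 && \text{for } i = 1, \dots, n
\end{aligned}
$$

and dual

$$
(D_{2C}) \qquad
\begin{aligned}
\min_{\alpha} \quad & \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j k(x_i, x_j) - \sum_{i=1}^{n} \alpha_i \\
\text{subject to} \quad & 0 \le \alpha_i \le C\gamma && \text{for } i \in I_+ \\
& 0 \le \alpha_i \le C(1 - \gamma) && \text{for } i \in I_- \\
& \sum_{i=1}^{n} \alpha_i y_i = 0
\end{aligned}
$$

where $C > 0$ and $\gamma \in [0, 1]$. Similarly, [5] proposed the 2ν-SVM as a cost-sensitive
extension of the ν-SVM. The 2ν-SVM has primal

$$
(P_{2\nu}) \qquad
\begin{aligned}
\min_{w, b, \xi, \rho} \quad & \frac{1}{2}\|w\|^2 - \nu\rho + \frac{\gamma}{n} \sum_{i \in I_+} \xi_i + \frac{1 - \gamma}{n} \sum_{i \in I_-} \xi_i \\
\text{subject to} \quad & y_i (k(w, x_i) + b) \ge \rho - \xi_i && \text{for } i = 1, \dots, n \\
& \xi_i \ge 0 && \text{for } i = 1, \dots, n \\
& \rho \ge 0
\end{aligned}
$$

and dual

$$
(D_{2\nu}) \qquad
\begin{aligned}
\min_{\alpha} \quad & \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \\
\text{subject to} \quad & 0 \le \alpha_i \le \frac{\gamma}{n} && \text{for } i \in I_+ \\
& 0 \le \alpha_i \le \frac{1 - \gamma}{n} && \text{for } i \in I_- \\
& \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \sum_{i=1}^{n} \alpha_i \ge \nu
\end{aligned}
$$

where $\nu \in [0, \frac{1}{2}]$ and $\gamma \in [0, 1]$.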
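The only place the cost asymmetry enters the two duals is the per-class upper bound on $\alpha_i$. The sketch below builds those bounds and checks the three constraints of $(D_{2\nu})$ for a candidate $\alpha$. The function names and the numerical tolerance `tol` are illustrative assumptions, not part of the report; labels are assumed to be coded as +1 and -1.

```python
def upper_bounds_2c(y, C, gamma):
    """Upper bounds on alpha_i in (D_2C): C*gamma on I_+, C*(1-gamma) on I_-."""
    return [C * gamma if y_i == 1 else C * (1.0 - gamma) for y_i in y]


def upper_bounds_2nu(y, gamma):
    """Upper bounds on alpha_i in (D_2nu): gamma/n on I_+, (1-gamma)/n on I_-."""
    n = len(y)
    return [gamma / n if y_i == 1 else (1.0 - gamma) / n for y_i in y]


def is_feasible_2nu(alpha, y, gamma, nu, tol=1e-12):
    """Check the box, equality, and sum constraints of (D_2nu)."""
    bounds = upper_bounds_2nu(y, gamma)
    in_box = all(-tol <= a <= u + tol for a, u in zip(alpha, bounds))
    balanced = abs(sum(a * y_i for a, y_i in zip(alpha, y))) <= tol
    enough_mass = sum(alpha) >= nu - tol
    return in_box and balanced and enough_mass
```

Setting $\gamma = 1/2$ gives both classes the same bound, which recovers the symmetric treatment of the two classes up to the scaling of $C$ and $\nu$.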


4 Properties of the 2ν-SVM

Before illustrating the relationship between the 2C-SVM and the 2ν-SVM, we establish some
of the basic properties of the 2ν-SVM.

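Proposition 1. Fix $\gamma \in [0, 1]$ and let $n_+ = |I_+|$, $n_- = |I_-|$. Then $(D_{2\nu})$ is
feasible if and only if $\nu \le \nu_{\max} \le \frac{1}{2}$, where

$$
\nu_{\max} = \frac{2 \min(\gamma n_+, (1 - \gamma) n_-)}{n}.
$$

Proof. First, assume that $\nu \le \nu_{\max}$. Then we can construct an $\alpha$ that satisfies the
constraints of $(D_{2\nu})$. Specifically, let

$$
\alpha_i = \frac{\nu_{\max}}{2 n_+} = \frac{\min(\gamma, (1 - \gamma) n_- / n_+)}{n} \le \frac{\gamma}{n} \quad \text{for } i \in I_+
$$

and

$$
\alpha_i = \frac{\nu_{\max}}{2 n_-} = \frac{\min(\gamma n_+ / n_-, 1 - \gamma)}{n} \le \frac{1 - \gamma}{n} \quad \text{for } i \in I_-.
$$

Then $\sum_{i \in I_+} \alpha_i + \sum_{i \in I_-} \alpha_i = \nu_{\max} \ge \nu$ and
$\sum_{i=1}^{n} \alpha_i y_i = 0$. Thus we have a feasible solution, and so $(D_{2\nu})$ is feasible.

Now assume that $(D_{2\nu})$ is feasible. Then there exists an $\alpha$ such that
$\sum_{i=1}^{n} \alpha_i \ge \nu$ and $\sum_{i \in I_+} \alpha_i = \sum_{i \in I_-} \alpha_i$. Combining
these we get $\nu \le 2 \sum_{i \in I_+} \alpha_i$. Since we also have $0 \le \alpha_i \le \gamma / n$
for $i \in I_+$, we see that $\nu \le 2 \sum_{i \in I_+} \alpha_i \le 2 \gamma n_+ / n$, and therefore
$\nu \le 2 \gamma n_+ / n$. Similarly, $\nu \le 2 (1 - \gamma) n_- / n$. Thus $\nu \le \nu_{\max}$.

Finally, note that $\nu_{\max}$ is maximized when $\gamma n_+ = (1 - \gamma) n_-$, which occurs when
$\gamma = n_- / n$, and thus

$$
\nu_{\max} \le \frac{2 n_- n_+}{n^2} \le \frac{1}{2}. \qquad \square
$$

Remark. We can use this result to show that $(D_{2\nu})$ is feasible for fixed
$\nu \in [0, \nu_{\max}]$ if and only if

$$
\frac{\nu n}{2 n_+} \le \gamma \le 1 - \frac{\nu n}{2 n_-}.
$$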
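The quantities in Proposition 1 and the remark are cheap to compute before any solver is called, which makes them useful as a parameter sanity check. The sketch below computes $\nu_{\max}$, the feasible point constructed in the proof, and the admissible interval for $\gamma$. The function names are hypothetical, and both classes are assumed to be nonempty so that no division by zero occurs.

```python
def nu_max(n_pos, n_neg, gamma):
    """Largest feasible nu for fixed gamma: 2*min(gamma*n_+, (1-gamma)*n_-)/n."""
    n = n_pos + n_neg
    return 2.0 * min(gamma * n_pos, (1.0 - gamma) * n_neg) / n


def proof_feasible_point(y, gamma):
    """The alpha built in the proof: nu_max/(2 n_+) on I_+, nu_max/(2 n_-) on I_-."""
    n_pos = sum(1 for y_i in y if y_i == 1)
    n_neg = len(y) - n_pos
    v = nu_max(n_pos, n_neg, gamma)
    return [v / (2.0 * n_pos) if y_i == 1 else v / (2.0 * n_neg) for y_i in y]


def gamma_range(nu, n_pos, n_neg):
    """Interval of gamma for which (D_2nu) is feasible at the given nu."""
    n = n_pos + n_neg
    return nu * n / (2.0 * n_pos), 1.0 - nu * n / (2.0 * n_neg)
```

For example, with three positive points, one negative point, and $\gamma = 0.5$, the bound evaluates to $\nu_{\max} = 2 \cdot 0.5 / 4 = 0.25$, and $\gamma = 0.5$ is exactly the upper end of the admissible interval for $\nu = 0.25$.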

Proposition 2. Fix $\gamma \in [0, 1]$ and $\nu \in [0, \nu_{\max}]$. There is at least one optimal
solution of $(D_{2\nu})$ that satisfies $\sum_{i=1}^{n} \alpha_i = \nu$. In addition, if the optimal
objective value of $(D_{2\nu})$ is not zero, all optimal solutions of $(D_{2\nu})$ satisfy
$\sum_{i=1}^{n} \alpha_i = \nu$.

Proof. This proposition was proved in [13] for $(D_\nu)$. The proof relies only on the form of the
objective function of $(D_\nu)$, which is identical to that of $(D_{2\nu})$. Thus, we omit it for the
sake of brevity and refer the reader to [13]. □

Remark. The cost-sensitive extension of the ν-SVM proposed in [5] is parameterized in a
different manner than $(D_{2\nu})$. Specifically, instead of parameters $\nu$ and $\gamma$, it is
formulated using $\nu_+$ and $\nu_-$, where

$$
\nu = \frac{2 \nu_+ \nu_- n_+ n_-}{(\nu_+ n_+ + \nu_- n_-)\, n}, \qquad
\gamma = \frac{\nu_- n_-}{\nu_+ n_+ + \nu_- n_-} = \frac{\nu n}{2 \nu_+ n_+},
$$

or equivalently

$$
\nu_+ = \frac{\nu n}{2 \gamma n_+}, \qquad \nu_- = \frac{\nu n}{2 (1 - \gamma) n_-}.
$$

This parametrization has the benefit that $\nu_+$ and $\nu_-$ have a more intuitive meaning, illustrated
by the following result.

Proposition 3. Suppose that the optimal objective value of $(D_{2\nu})$ is not zero. Then for the
optimal solution of $(D_{2\nu})$:

1. $\nu_+$ is an upper bound on the fraction of margin errors from class +1.

2. $\nu_-$ is an upper bound on the fraction of margin errors from class −1.

3. $\nu_+$ is a lower bound on the fraction of support vectors from class +1.

4. $\nu_-$ is a lower bound on the fraction of support vectors from class −1.

Proof. See [5] for the proof. □

Proposition 4. $(D_{2\nu})$ is feasible if and only if $\nu_+ \le 1$ and $\nu_- \le 1$.

Proof. From Proposition 1 we have that $(D_{2\nu})$ is feasible if and only if

$$
\nu \le \frac{2 \min(\gamma n_+, (1 - \gamma) n_-)}{n}.
$$

Thus, $(D_{2\nu})$ is feasible if and only if

$$
\frac{2 \nu_+ \nu_- n_+ n_-}{(\nu_+ n_+ + \nu_- n_-)\, n}
\le \frac{2 \min\left( \frac{\nu_- n_- n_+}{\nu_+ n_+ + \nu_- n_-},\ \frac{\nu_+ n_+ n_-}{\nu_+ n_+ + \nu_- n_-} \right)}{n}
\iff \nu_+ \nu_- \le \min(\nu_-, \nu_+),
$$

or equivalently, $\nu_+ \le 1$ and $\nu_- \le 1$. □
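The change of variables between $(\nu, \gamma)$ and $(\nu_+, \nu_-)$ is easy to get wrong by hand, so a small round-trip helper can be useful. The following sketch implements the two maps from the remark above. The function names are hypothetical, and the inputs are assumed to satisfy $\nu_+, \nu_- > 0$, $0 < \gamma < 1$, and $n_+, n_- > 0$ so that every denominator is nonzero.

```python
def to_nu_gamma(nu_pos, nu_neg, n_pos, n_neg):
    """Map (nu_+, nu_-) to (nu, gamma) for class sizes n_+ and n_-."""
    n = n_pos + n_neg
    s = nu_pos * n_pos + nu_neg * n_neg
    nu = 2.0 * nu_pos * nu_neg * n_pos * n_neg / (s * n)
    gamma = nu_neg * n_neg / s
    return nu, gamma


def to_nu_pos_neg(nu, gamma, n_pos, n_neg):
    """Map (nu, gamma) back to (nu_+, nu_-)."""
    n = n_pos + n_neg
    nu_pos = nu * n / (2.0 * gamma * n_pos)
    nu_neg = nu * n / (2.0 * (1.0 - gamma) * n_neg)
    return nu_pos, nu_neg
```

As a check of Proposition 4, taking $\nu_+ = \nu_- = 1$ with $n_+ = 30$ and $n_- = 70$ gives $\gamma = 0.7$ and $\nu = 0.42$, which equals $\nu_{\max} = 2 \min(21, 21) / 100$ for that $\gamma$: the corner of the unit square lands exactly on the feasibility boundary.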


5 Relationship between the 2ν-SVM and 2C-SVM


The following theorems illustrate the relationship between $(D_{2C})$ and $(D_{2\nu})$. The first shows
how solutions of $(D_{2C})$ are related to solutions of $(D_{2\nu})$, and the second shows how solutions
of $(D_{2\nu})$ are related to solutions of $(D_{2C})$. The third theorem, our main result, shows that
increasing (decreasing) $\nu$ is similar to decreasing (increasing) $C$. The proofs of these theorems
can be found in Section 7.


Theorem 1. Fix $\gamma \in [0, 1]$. For each $C > 0$, let $\alpha^C$ be any optimal solution of
$(D_{2C})$ and set $\nu = \sum_{i=1}^{n} \alpha_i^C / (Cn)$. Then any $\alpha$ is an optimal solution of
$(D_{2C})$ if and only if $\alpha / (Cn)$ is an optimal solution of $(D_{2\nu})$.

Theorem 2. Fix $\gamma \in [0, 1]$. Assume that $(D_{2\nu})$, with $0 < \nu \le \nu_{\max}$, has a nonzero
optimal objective value. Then $\rho > 0$. Set $C = 1 / (\rho n)$. Then any $\alpha$ is an optimal
solution of $(D_{2C})$ if and only if $\alpha / (Cn)$ is an optimal solution of $(D_{2\nu})$.

Theorem 3. Fix $\gamma \in [0, 1]$ and let $\alpha^C$ be any optimal solution of $(D_{2C})$. Define

$$
\nu_* = \lim_{C \to \infty} \frac{\sum_{i=1}^{n} \alpha_i^C}{Cn}
$$

and

$$
\nu^* = \lim_{C \to 0} \frac{\sum_{i=1}^{n} \alpha_i^C}{Cn}.
$$

Then $0 \le \nu_* \le \nu^* = \nu_{\max} \le \frac{1}{2}$. Thus, for any $\nu > \nu^*$, $(D_{2\nu})$ is
infeasible. For any $\nu \in (\nu_*, \nu^*]$ the optimal objective value of $(D_{2\nu})$ is strictly
positive, thus there exists at least one $C > 0$ such that any $\alpha$ is an optimal solution of
$(D_{2C})$ if and only if $\alpha / (Cn)$ is an optimal solution of $(D_{2\nu})$. For any
$\nu \in [0, \nu_*]$, $(D_{2\nu})$ is feasible with zero optimal objective value (and a trivial
solution).
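Theorem 3 splits the $\nu$ axis into three intervals. As a reading aid, the sketch below encodes that partition. The function name and the string labels are hypothetical, and the two thresholds `nu_lower` ($\nu_*$) and `nu_upper` ($\nu^* = \nu_{\max}$) are taken as given inputs; computing $\nu_*$ in general requires solving the 2C-SVM for large $C$, which is not attempted here.

```python
def nu_regime(nu, nu_lower, nu_upper):
    """Classify nu according to the three cases of Theorem 3.

    nu_lower plays the role of nu_* and nu_upper the role of nu^* = nu_max.
    """
    if nu > nu_upper:
        return "infeasible"   # (D_2nu) has no feasible point
    if nu > nu_lower:
        return "nontrivial"   # positive objective; matches some C > 0
    return "trivial"          # zero objective value, w = 0
```

Only the middle interval is worth searching during parameter selection, since every $\nu$ in it corresponds to at least one finite $C > 0$.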

Remark. Consider the case where the data can be perfectly separated by a hyperplane. In
this case, if $C \to \infty$, margin errors are penalized more heavily, and thus for some sufficiently
large $C$, the solution of $(D_{2C})$ will be the $\alpha^*$ corresponding to the separating hyperplane.
Thus there exists some $C^*$ such that $\alpha^*$ (corresponding to the separating hyperplane) is an
optimal solution of $(D_{2C})$ for all $C \ge C^*$. In this case, as $C \to \infty$,
$\sum_{i=1}^{n} \alpha_i^* / (Cn) \to 0$, and thus

$$
\nu_* = 0.
$$


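Remark. Using the definitions of $\nu_+$ and $\nu_-$ in Section 4, it is easy to see that Theorem
3 implies that if $\gamma$ is fixed and we let $C \to \infty$, $(D_{2C})$ is equivalent to $(D_{2\nu})$
(in the sense described above) if we let

$$
\nu_+ = \frac{\nu_* n}{2 \gamma n_+} \ge 0, \qquad
\nu_- = \frac{\nu_* n}{2 (1 - \gamma) n_-} \ge 0.
$$

Similarly, if $\gamma$ is fixed and we let $C \to 0$, $(D_{2C})$ is equivalent to $(D_{2\nu})$ if we let

$$
\nu_+ = \frac{\nu_{\max} n}{2 \gamma n_+} = \min\left(1, \frac{(1 - \gamma) n_-}{\gamma n_+}\right), \qquad
\nu_- = \frac{\nu_{\max} n}{2 (1 - \gamma) n_-} = \min\left(1, \frac{\gamma n_+}{(1 - \gamma) n_-}\right).
$$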
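The small-$C$ limits above depend only on $\gamma$ and the class sizes, so they can be tabulated directly. The following sketch evaluates them. The function name is hypothetical, and $0 < \gamma < 1$ with both classes nonempty is assumed so that the ratio is finite and nonzero.

```python
def small_c_limits(n_pos, n_neg, gamma):
    """Return (nu_+, nu_-) in the C -> 0 limit for fixed gamma.

    nu_+ = min(1, (1-gamma) n_- / (gamma n_+)),
    nu_- = min(1, gamma n_+ / ((1-gamma) n_-)).
    """
    ratio = (1.0 - gamma) * n_neg / (gamma * n_pos)
    return min(1.0, ratio), min(1.0, 1.0 / ratio)
```

At least one of the two limits is always equal to 1, which matches the boundary of the feasible square $[0, 1]^2$ identified in Proposition 4.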


6  Conclusion 

In this paper we have reviewed extensions of the two main SVM formulations. These
extensions address the practical need to penalize errors from the two classes differently in many
classification tasks. The 2C-SVM is commonly used to address this problem, but we have
proven that the 2ν-SVM is equivalent to the 2C-SVM in a certain sense. Additionally, we
have shown that the 2ν-SVM has many properties that make it an attractive alternative
to the 2C-SVM. Specifically, as $C$ becomes very large or small, numerical implementations
of the 2C-SVM can become unstable. Thus, when performing parameter estimation, it is
typical to restrict $C$ to a range of possible values. However, this range is inevitably
arbitrary. The 2ν-SVM replaces $C$ and $\gamma$ with $\nu_+$ and $\nu_-$. These parameters have a more
intuitive meaning, and we have shown that the 2ν-SVM has a feasible solution if and only
if $(\nu_+, \nu_-) \in [0, 1]^2$. Thus, the 2ν-SVM offers a much more natural setting for parameter
selection, which is a critical issue in practical applications.


7  Proof  of  Theorems 


In order to compare $(D_C)$ and $(D_\nu)$, we can rescale $(D_C)$ (by setting $\alpha' = \alpha / (Cn)$),
in which case we obtain:

$$
(D'_C) \qquad
\begin{aligned}
\min_{\alpha} \quad & \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j k(x_i, x_j) - \frac{1}{Cn} \sum_{i=1}^{n} \alpha_i \\
\text{subject to} \quad & 0 \le \alpha_i \le \frac{1}{n} && \text{for } i = 1, \dots, n \\
& \sum_{i=1}^{n} \alpha_i y_i = 0.
\end{aligned}
$$

The solutions of $(D_C)$ and $(D'_C)$ have a simple relationship: $\alpha$ is a solution of $(D_C)$ if and
only if $\alpha / (Cn)$ is a solution of $(D'_C)$. Thus, in this sense, $(D_C)$ and $(D'_C)$ are equivalent.
Furthermore, notice that $(D'_C)$ and $(D_\nu)$ differ only in their objective functions and the
additional inequality constraint of $(D_\nu)$. In [13] this similarity was exploited to establish a
detailed relationship between $(D_\nu)$ and $(D'_C)$, and hence between $(D_\nu)$ and $(D_C)$.

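We follow a similar course and rescale $(D_{2C})$ by $Cn$ in order to compare it with $(D_{2\nu})$.
This gives us:

$$
(D'_{2C}) \qquad
\begin{aligned}
\min_{\alpha} \quad & \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j k(x_i, x_j) - \frac{1}{Cn} \sum_{i=1}^{n} \alpha_i \\
\text{subject to} \quad & 0 \le \alpha_i \le \frac{\gamma}{n} && \text{for } i \in I_+ \\
& 0 \le \alpha_i \le \frac{1 - \gamma}{n} && \text{for } i \in I_- \\
& \sum_{i=1}^{n} \alpha_i y_i = 0.
\end{aligned}
$$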
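The rescaling can be checked numerically: substituting $\alpha = Cn\,\alpha'$ into the objective of $(D_{2C})$ multiplies the rescaled objective by the positive constant $(Cn)^2$, so minimizers correspond one to one. The sketch below verifies this identity on a precomputed Gram matrix. The function names and the argument `K` (a list of lists holding $k(x_i, x_j)$) are assumptions made for illustration.

```python
def quad_form(alpha, y, K):
    """Compute sum_{i,j} alpha_i alpha_j y_i y_j K[i][j]."""
    n = len(alpha)
    return sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))


def objective_2c(alpha, y, K):
    """Objective of (D_2C): 0.5 * quadratic form - sum(alpha)."""
    return 0.5 * quad_form(alpha, y, K) - sum(alpha)


def objective_2c_rescaled(alpha_p, y, K, C):
    """Objective of the rescaled dual: 0.5 * quadratic form - sum(alpha') / (C n)."""
    n = len(alpha_p)
    return 0.5 * quad_form(alpha_p, y, K) - sum(alpha_p) / (C * n)


def rescale(alpha, C):
    """Apply alpha' = alpha / (C n)."""
    n = len(alpha)
    return [a / (C * n) for a in alpha]
```

The same division maps the box $[0, C\gamma]$ to $[0, \gamma / n]$ and $[0, C(1 - \gamma)]$ to $[0, (1 - \gamma) / n]$, which is why the rescaled feasible set coincides with that of $(D_{2\nu})$ apart from the constraint $\sum_i \alpha_i \ge \nu$.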

Rather than proving the theorems in Section 5 directly, we will take advantage of the
relationship between $(D_{2C})$ and $(D'_{2C})$. We will establish equivalent theorems (which we denote
Theorems 1', 2', and 3') relating $(D_{2\nu})$ and $(D'_{2C})$, which are then trivially extended to the
theorems stated in Section 5. We begin by proving the following lemma:

Lemma 1. Fix $\gamma \in [0, 1]$ for both $(D'_{2C})$ and $(D_{2\nu})$. Assume $(D'_{2C})$ and $(D_{2\nu})$
share one optimal solution $\alpha^*$ with $\sum_{i=1}^{n} \alpha_i^* = \nu$. Then any $\alpha$ is an optimal
solution of $(D'_{2C})$ if and only if it is an optimal solution of $(D_{2\nu})$.

Proof. The analogue of this lemma for $(D'_C)$ and $(D_\nu)$ is proved in [13]. The proof depends
only on the form of the objective functions (specifically not taking the constraints into
account) and on the analogue of Proposition 2. Since the objective function of $(D_\nu)$ is
identical to that of $(D_{2\nu})$ and the objective function of $(D'_C)$ is also identical to that of
$(D'_{2C})$, we refer the reader to [13] and omit the proof. □

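For the proofs of Theorems 1' and 2', we will need to employ the Karush-Kuhn-Tucker
(KKT) conditions. Essentially, the KKT conditions are necessary and sufficient conditions
for $\alpha$ to be an optimal solution to our optimization problem. Specifically, $\alpha$ is an optimal
solution of $(D'_{2C})$ if and only if there exist $b \in \mathbb{R}$ and $\lambda, \xi \in \mathbb{R}^n$
satisfying the KKT conditions:

$$
\begin{aligned}
& \sum_{j=1}^{n} \alpha_j y_i y_j k(x_i, x_j) - \frac{1}{Cn} + b y_i = \lambda_i - \xi_i && \text{for } i = 1, \dots, n && (1) \\
& \lambda_i \alpha_i = 0, \quad \lambda_i \ge 0, \quad \xi_i \ge 0 && \text{for } i = 1, \dots, n && (2) \\
& \xi_i \left( \frac{\gamma}{n} - \alpha_i \right) = 0, \quad 0 \le \alpha_i \le \frac{\gamma}{n} && \text{for } i \in I_+ && (3) \\
& \xi_i \left( \frac{1 - \gamma}{n} - \alpha_i \right) = 0, \quad 0 \le \alpha_i \le \frac{1 - \gamma}{n} && \text{for } i \in I_- && (4) \\
& \sum_{i=1}^{n} \alpha_i y_i = 0. && && (5)
\end{aligned}
$$

Similarly, $\alpha$ is an optimal solution of $(D_{2\nu})$ if and only if there exist $b, \rho \in \mathbb{R}$
and $\lambda, \xi \in \mathbb{R}^n$ satisfying the slightly different KKT conditions:

$$
\begin{aligned}
& \sum_{j=1}^{n} \alpha_j y_i y_j k(x_i, x_j) - \rho + b y_i = \lambda_i - \xi_i && \text{for } i = 1, \dots, n && (6) \\
& \lambda_i \alpha_i = 0, \quad \lambda_i \ge 0, \quad \xi_i \ge 0 && \text{for } i = 1, \dots, n && (7) \\
& \xi_i \left( \frac{\gamma}{n} - \alpha_i \right) = 0, \quad 0 \le \alpha_i \le \frac{\gamma}{n} && \text{for } i \in I_+ && (8) \\
& \xi_i \left( \frac{1 - \gamma}{n} - \alpha_i \right) = 0, \quad 0 \le \alpha_i \le \frac{1 - \gamma}{n} && \text{for } i \in I_- && (9) \\
& \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \sum_{i=1}^{n} \alpha_i \ge \nu, \quad \rho \left( \sum_{i=1}^{n} \alpha_i - \nu \right) = 0. && && (10)
\end{aligned}
$$

Notice that the two sets of conditions are mostly identical, except for the first and last two
of the conditions for $(D_{2\nu})$. Using this observation, we can prove equivalent versions of the
first two theorems.

Theorem 1'. Fix $\gamma \in [0, 1]$. For each $C > 0$, let $\alpha^C$ be any optimal solution of
$(D'_{2C})$ and set $\nu = \sum_{i=1}^{n} \alpha_i^C$. Then any $\alpha$ is an optimal solution of $(D'_{2C})$
if and only if it is an optimal solution of $(D_{2\nu})$.

Proof. If $\alpha^C$ is an optimal solution of $(D'_{2C})$ then it satisfies the KKT conditions for
$(D'_{2C})$. By setting $\nu = \sum_{i=1}^{n} \alpha_i^C$ and $\rho = 1 / (Cn)$ we see that $\alpha^C$ also
satisfies the KKT conditions for $(D_{2\nu})$ and thus is an optimal solution of $(D_{2\nu})$. From Lemma 1
we thus have that, for any $\alpha$, $\alpha$ is an optimal solution of $(D'_{2C})$ if and only if it is an
optimal solution of $(D_{2\nu})$. □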

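Theorem 2'. Fix $\gamma \in [0, 1]$. Assume that $(D_{2\nu})$, with $0 < \nu \le \nu_{\max}$, has a nonzero
optimal objective value. Then $\rho > 0$. Set $C = 1 / (\rho n)$. Then any $\alpha$ is an optimal
solution of $(D'_{2C})$ if and only if it is an optimal solution of $(D_{2\nu})$.

Proof. If $\alpha^*$ is an optimal solution of $(D_{2\nu})$ then it satisfies the KKT conditions for
$(D_{2\nu})$. From the KKT conditions we have

$$
\sum_{i=1}^{n} \left( \sum_{j=1}^{n} \alpha_j^* y_i y_j k(x_i, x_j) - \rho + b y_i \right) \alpha_i^*
= \sum_{i=1}^{n} (\lambda_i - \xi_i) \alpha_i^*
$$

which, by applying the remaining KKT conditions, reduces to

$$
\sum_{i,j=1}^{n} \alpha_i^* \alpha_j^* y_i y_j k(x_i, x_j) - \rho \sum_{i=1}^{n} \alpha_i^*
= - \sum_{i=1}^{n} \xi_i \alpha_i^*.
$$

By assumption, $(D_{2\nu})$ has a nonzero optimal objective value. Thus from Proposition 2 we
have that $\sum_{i=1}^{n} \alpha_i^* = \nu$, and so we have

$$
\rho = \frac{1}{\nu} \left( \sum_{i,j=1}^{n} \alpha_i^* \alpha_j^* y_i y_j k(x_i, x_j) + \sum_{i=1}^{n} \xi_i \alpha_i^* \right) > 0.
$$

Thus we can choose $C > 0$ such that $C = 1 / (\rho n)$ and $\alpha^*$ is a KKT point of $(D'_{2C})$. Thus
from Lemma 1 any $\alpha$ is an optimal solution of $(D'_{2C})$ if and only if it is an optimal solution
of $(D_{2\nu})$. □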

We will need the following lemmas to prove Theorem 3'.


Lemma 2. Fix $\gamma \in [0, 1]$ and let $\alpha^C$ be an optimal solution of $(D'_{2C})$. Define
$\nu = \sum_{i=1}^{n} \alpha_i^C$. If the optimal objective value of $(D_{2\nu})$ is zero, then
$\nu = \nu_{\max}$ and any $\alpha$ is an optimal solution of $(D_{2\nu})$ if and only if it is an optimal
solution for all $(D'_{2C})$, $C > 0$.

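Proof. By setting $\rho = 1 / (Cn)$, $\alpha^C$ is a KKT point of $(D_{2\nu})$. Therefore, if the objective
function of $(D_{2\nu})$ is zero, $\sum_{i,j=1}^{n} \alpha_i^C \alpha_j^C y_i y_j k(x_i, x_j) = 0$. Since $k$ is
a positive definite kernel, we also have $\sum_{j=1}^{n} \alpha_j^C y_i y_j k(x_i, x_j) = 0$. In this case,
(1) of the KKT conditions becomes

$$
-\frac{1}{Cn} + b y_i = \lambda_i - \xi_i \quad \text{for } i = 1, \dots, n
$$

or

$$
\lambda_i - \xi_i =
\begin{cases}
-\dfrac{1}{Cn} + b & \text{for } i \in I_+ \\[2mm]
-\dfrac{1}{Cn} - b & \text{for } i \in I_-.
\end{cases}
$$

Assume that $b \ge 0$, then

$$
\lambda_i - \xi_i < 0 \quad \text{for } i \in I_-.
$$

This implies that $\xi_i > 0$ for all $i \in I_-$ since both $\lambda_i$ and $\xi_i$ are nonnegative. Therefore, in
order for the KKT condition $\xi_i ((1 - \gamma)/n - \alpha_i^C) = 0$ to hold, we must have
$\alpha_i^C = (1 - \gamma)/n$ for all $i \in I_-$. From condition (5) we have that
$\sum_{i \in I_+} \alpha_i^C = \sum_{i \in I_-} \alpha_i^C$, thus we need
$\sum_{i \in I_+} \alpha_i^C = (1 - \gamma) n_- / n \le \gamma n_+ / n$. Therefore, if
$(1 - \gamma) n_- > \gamma n_+$ then we have a contradiction, and it must be that $b < 0$.

However, assume without loss of generality that $(1 - \gamma) n_- \le \gamma n_+$, in which case $b \ge 0$
and $\alpha_i^C = (1 - \gamma)/n$ for all $i \in I_-$. There are three possibilities for $i \in I_+$:

1. $\lambda_i - \xi_i < 0$

2. $\lambda_i - \xi_i > 0$

3. $\lambda_i - \xi_i = 0$.

In case 1, where $\lambda_i - \xi_i < 0$, we have that $\xi_i > 0$ for all $i \in I_+$. For the KKT condition
$\xi_i (\gamma / n - \alpha_i^C) = 0$ to hold, we need $\alpha_i^C = \gamma / n$ for all $i \in I_+$. The
requirement that $\sum_{i \in I_+} \alpha_i^C = \sum_{i \in I_-} \alpha_i^C$ and the fact that
$\alpha_i^C = (1 - \gamma)/n$ for all $i \in I_-$ imply that
$\sum_{i=1}^{n} \alpha_i^C = 2 n_+ \gamma / n = 2 n_- (1 - \gamma) / n = \nu_{\max}$. Furthermore, the
objective function for $(D'_{2C})$ in this case becomes

$$
\min_{\alpha} \; -\frac{1}{Cn} \sum_{i=1}^{n} \alpha_i
$$

which is clearly minimized by $\alpha^C$ (in which case $\sum_{i=1}^{n} \alpha_i^C = \nu_{\max}$) for all
$C > 0$, thus $\alpha^C$ is an optimal solution of $(D'_{2C})$ for all $C > 0$. By Lemma 1, any $\alpha$ is an
optimal solution of $(D_{2\nu})$ if and only if it is an optimal solution for all $(D'_{2C})$, $C > 0$.

In case 2, where $\lambda_i - \xi_i > 0$, we have that $\lambda_i > 0$ for all $i \in I_+$. For the KKT condition
$\lambda_i \alpha_i^C = 0$ to hold, we need $\alpha_i^C = 0$ for all $i \in I_+$. However, the requirement that
$\sum_{i \in I_+} \alpha_i^C = \sum_{i \in I_-} \alpha_i^C$ and the fact that $\alpha_i^C = (1 - \gamma)/n$ for
all $i \in I_-$ lead to a contradiction if $I_-$ is nonempty. Hence all the training vectors are in the same
class, and $\alpha_i^C = 0$ for all $i$. Thus, $\sum_{i=1}^{n} \alpha_i^C = 0 = \nu_{\max}$. Furthermore, if
all the data are from the same class then $\alpha^C = 0$ is an optimal solution of $(D'_{2C})$ for all
$C > 0$. Thus, by Lemma 1, any $\alpha$ is an optimal solution of $(D_{2\nu})$ if and only if it is an optimal
solution for all $(D'_{2C})$, $C > 0$.

In case 3, where $\lambda_i - \xi_i = 0$, we have that either $\lambda_i = \xi_i \neq 0$ or
$\lambda_i = \xi_i = 0$ for each $i \in I_+$. However, $\lambda_i = \xi_i \neq 0$ leads to a contradiction
because the KKT conditions would require both $\alpha_i^C = 0$ and $\alpha_i^C = \gamma / n$. Thus,
$\lambda_i = \xi_i = 0$ and the KKT conditions involving $\lambda_i$ and $\xi_i$ impose no conditions on
$\alpha_i^C$ for $i \in I_+$. Since $\alpha_i^C = (1 - \gamma)/n$ for all $i \in I_-$ and
$(1 - \gamma) n_- \le \gamma n_+$, we have
$\sum_{i \in I_+} \alpha_i^C = \sum_{i \in I_-} \alpha_i^C = (1 - \gamma) n_- / n$. Thus,
$\sum_{i=1}^{n} \alpha_i^C = 2 (1 - \gamma) n_- / n = \nu_{\max}$. Furthermore, by setting $b = 1 / (Cn)$,
$\alpha^C$ is an optimal solution of $(D'_{2C})$ for all $C > 0$. Thus, by Lemma 1, any $\alpha$ is an optimal
solution of $(D_{2\nu})$ if and only if it is an optimal solution for all $(D'_{2C})$, $C > 0$. □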

Lemma 3. Assume $\alpha^C$ is any optimal solution of $(D'_{2C})$, then $\sum_{i=1}^{n} \alpha_i^C$ is a
continuous decreasing function of $C$ on $(0, \infty)$.

Proof. Again, the analogue of this lemma for $(D'_C)$ is proved in [13]. Since the proof depends
only on the form of the objective function and the analogues of Theorems 1' and 2', we omit
the proof and refer the reader to [13]. □

Using  these  lemmas,  we  are  now  ready  to  prove  the  equivalent  of  the  main  theorem: 

Theorem 3'. Fix $\gamma \in [0, 1]$ and let $\alpha^C$ be any optimal solution of $(D'_{2C})$. Define

$$
\nu_* = \lim_{C \to \infty} \sum_{i=1}^{n} \alpha_i^C
$$

and

$$
\nu^* = \lim_{C \to 0} \sum_{i=1}^{n} \alpha_i^C.
$$

Then $0 \le \nu_* \le \nu^* = \nu_{\max} \le \frac{1}{2}$. Thus, for any $\nu > \nu^*$, $(D_{2\nu})$ is
infeasible. For any $\nu \in (\nu_*, \nu^*]$, the optimal objective value of $(D_{2\nu})$ is strictly
positive, thus there exists at least one $C > 0$ such that any $\alpha$ is an optimal solution of
$(D'_{2C})$ if and only if it is an optimal solution of $(D_{2\nu})$. For any $\nu \in [0, \nu_*]$,
$(D_{2\nu})$ is feasible with zero optimal objective value (and a trivial solution).


Proof. From Lemma 3 and the fact that $0 \le \sum_{i=1}^{n} \alpha_i^C \le \nu_{\max}$ we know that the
above limits exist and are well defined.

For any optimal solution of $(D'_{2C})$ we have that the KKT condition (1) holds:

$$
\begin{aligned}
\sum_{j=1}^{n} \alpha_j^C y_i y_j k(x_i, x_j) - \frac{1}{Cn} + b &= \lambda_i - \xi_i && \text{for } i \in I_+ \\
\sum_{j=1}^{n} \alpha_j^C y_i y_j k(x_i, x_j) - \frac{1}{Cn} - b &= \lambda_i - \xi_i && \text{for } i \in I_-.
\end{aligned}
$$

Assume that $b \ge 0$. Since $\alpha^C$ is bounded, when $C$ is sufficiently small, we will have
$\lambda_i - \xi_i < 0$ for $i \in I_-$, thus $\xi_i > 0$ and from the KKT conditions,
$\alpha_i^C = (1 - \gamma)/n$ for all $i \in I_-$. If $\gamma n_+ / n \ge (1 - \gamma) n_- / n$, then this
$\alpha^C$ is feasible and $\sum_{i=1}^{n} \alpha_i^C = 2 (1 - \gamma) n_- / n = \nu_{\max}$. However, if
$\gamma n_+ / n < (1 - \gamma) n_- / n$ then we have a contradiction, and thus $b < 0$. In this case, for
$C$ sufficiently small, $\lambda_i - \xi_i < 0$ for $i \in I_+$. As above, this implies that
$\alpha_i^C = \gamma / n$ for all $i \in I_+$ and thus
$\sum_{i=1}^{n} \alpha_i^C = 2 \gamma n_+ / n = \nu_{\max}$. Hence,
$\nu^* = \lim_{C \to 0} \sum_{i=1}^{n} \alpha_i^C = \nu_{\max}$. In this case, from Proposition 1 we
immediately know that $(D_{2\nu})$ is infeasible if $\nu > \nu^*$.

From Proposition 1, we know that for all $\nu \le \nu^*$, $(D_{2\nu})$ is feasible. From Lemma 3 we
know that $\sum_{i=1}^{n} \alpha_i^C$ is a continuous decreasing function. Thus for any
$\nu \in (\nu_*, \nu^*]$, there is a $C > 0$ such that $\sum_{i=1}^{n} \alpha_i^C = \nu$, and any $\alpha$ is
an optimal solution of $(D_{2\nu})$ if and only if it is an optimal solution for $(D'_{2C})$.

If $\nu < \nu_*$, $(D_{2\nu})$ must have an optimal objective value of zero because of Theorem 2' and
the definition of $\nu_*$. If $\nu = \nu_* = 0$, the optimal objective value of $(D_{2\nu})$ is zero, as
$\alpha = 0$ is a feasible solution. If $\nu = \nu_* > 0$, the fact that feasible regions of $(D_{2\nu})$ are
bounded by $0 \le \alpha_i \le \gamma / n$ for $i \in I_+$ and $0 \le \alpha_i \le (1 - \gamma)/n$ for
$i \in I_-$, and Proposition 2 imply that there exists a sequence $\nu_1 < \nu_2 < \cdots < \nu_*$ such that
$\alpha^j$ is an optimal solution of $(D_{2\nu})$ with $\nu = \nu_j$, $\sum_{i=1}^{n} \alpha_i^j = \nu_j$, and
$\hat{\alpha} = \lim_{\nu_j \to \nu_*} \alpha^j$ exists. Since
$\sum_{i=1}^{n} \hat{\alpha}_i = \lim_{\nu_j \to \nu_*} \sum_{i=1}^{n} \alpha_i^j = \nu_*$, we also have that
$0 \le \hat{\alpha}_i \le \gamma / n$ for $i \in I_+$, $0 \le \hat{\alpha}_i \le (1 - \gamma)/n$ for
$i \in I_-$, and $\sum_{i=1}^{n} y_i \hat{\alpha}_i = \lim_{\nu_j \to \nu_*} \sum_{i=1}^{n} y_i \alpha_i^j = 0$,
so $\hat{\alpha}$ is feasible to $(D_{2\nu})$ for $\nu = \nu_*$. However,
$\sum_{i,m=1}^{n} \hat{\alpha}_i \hat{\alpha}_m y_i y_m k(x_i, x_m) = \lim_{\nu_j \to \nu_*} \sum_{i,m=1}^{n} \alpha_i^j \alpha_m^j y_i y_m k(x_i, x_m) = 0$
as $\sum_{i,m=1}^{n} \alpha_i^j \alpha_m^j y_i y_m k(x_i, x_m) = 0$ for all $\nu_j$. Therefore the optimal
objective value of $(D_{2\nu})$ is zero if $\nu = \nu_*$.

Finally, from the above discussion, if $\nu \le \nu_*$, the objective value of $(D_{2\nu})$ is zero. If the
objective value of $(D_{2\nu})$ is zero but $\nu > \nu_*$, then by Lemma 3 there is a $C > 0$ such that, if
$\alpha^C$ is an optimal solution of $(D'_{2C})$, $\sum_{i=1}^{n} \alpha_i^C = \nu$. Thus, from Lemma 2, we
have that $\nu = \nu_{\max} = \nu^* \le \nu_*$, a contradiction. Thus the objective value of $(D_{2\nu})$ is
zero if and only if $\nu \le \nu_*$. In this case, $w = 0$ and we say that the solution is trivial. □